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Abstract — Magnetic Resonance Imaging (MRI) is one of tlie 
fields that tlie compressed sensing tiieory is well utilized to 
reduce the scan time significantly leading to faster imaging 
or higher resolution images. It has been shown that a small 
fraction of the overall measurements are sufficient to reconstruct 
images with the combination of compressed sensing and parallel 
imaging. Various reconstruction algorithms has been proposed 
for compressed sensing, among which Augmented Lagrangian 
based methods have been shown to often perform better than 
others for many different applications. In this paper, we propose 
new Augmented Lagrangian based solutions to the compressed 
sensing reconstruction problem with analysis and synthesis prior 
formulations. We also propose a computational method which 
makes use of properties of the sampling pattern to significantly 
improve the speed of the reconstruction for the proposed al- 
gorithms in Cartesian sampled MRI. The proposed algorithms 
are shown to outperform earlier methods especially for the 
case of dynamic MRI for which the transfer function tends 
to be a very large matrix and significantly ill conditioned. 
It is also demonstrated that the proposed algorithm can be 
accelerated much further than other methods in case of a parallel 
implementation with graphics processing units (GPUs). 

I. Introduction 

Compressed sensing and sparse reconstruction methods 
have been popular topics of research especially in the last 
decade. Under certain conditions, such as the data being 
sparse, i.e. with a few non-zero coefficients in a domain that 
is incoherent with measurement domain, compressed sensing 
enables reliable recovery of signals even if they are measured 
at a rate below the Nyquist rate [1]. This stimulated research 
in many different fields in which data acquisition is limited 
due to constraints. 

Medical imaging is one of the application areas that adopted 
compressed sensing principles rather quickly. It is shown that 
in magnetic resonance imaging (MRI), the number of mea- 
surement samples and thus the scan time can be reduced while 
preserving image quality if compressed sensing principles are 
used [2]. This is further improved by parallel imaging with 
algorithms such as SparseSENSE utilizing multiple coils [3]. 
These initial studies with compressed sensing were on still 
images or volumes and spatial total variation (TV) or wavelets 
were used as regularization constraints. Dynamic imaging is 
shown to benefit from compressed sensing even further due 
to the images being significantly correlated along temporal 
dimension and therefore represented with sparse temporal 
transforms. Similarly to the spatial case, temporal TV and 
wavelets are commonly used in MRI [4], [5]. The temporal 



Fourier transform is also utilized with k-t SparseSENSE [6] 
especially with cardiac MRI, due to the cardiac motion being 
periodic therefore sparse with respect to Fourier transform. 
K-t Group Sparse SENSE is introduced by Usman et al. [7] 
which groups pixels with respect to their spatio temporal 
activity to treat static and dynamic regions differently during 
reconstruction. 

Parallel computing has gained a great deal of interest in 
recent years and processors with multiple cores has become 
a standard in commercial computing. With the introduction 
of NVidia CUDA and ATI Stream architectures, it is possible 
to make use of large number of processor cores in graphics 
processing units (GPUs) for general purpose computing. Image 
reconstruction in the field of MRI has been shown to benefit 
from parallel processing and GPUs for real-time reconstruction 
[8], [9]. Regardless of the algorithm used, the reconstruction 
in compressed sensing MRI takes significantly more time 
compared to reconstruction of regular MRI. However real-time 
or fast reconstruction is essential in order for the commercial 
implementations of compressed sensing MRI to be useful in 
the field of medicine. Therefore fast reconstruction algorithms 
that can be implemented in parallel are necessary for com- 
pressed sensing MRI to be commercially viable. 

In all the compressed sensing based approaches to medical 
imaging, the images are reconstructed by enforcing sparsity of 
the signal in a separate transform domain while simultaneously 
having the constraint of consistency with the measurements. 
This is often formulated as a basis pursuit problem and many 
different algorithms has been proposed for the solution such 
as iterative shrinkage based methods (FISTA [10], TwIST 
[11]). The Augmented Lagrangian based methods or mainly 
Alternating Direction Method of Multipliers (ADMM) ap- 
proach [12] which is mostly equivalent to the Split-Bregman 
Method [13] has gained significant popularity especially in 
recent years due to its flexibility, ease of implementation and 
speed. SALSA algorithm which is also based on ADMM, has 
been shown to converge faster than other popular methods 
in various different imaging inverse problems [14] combining 
the ideas in ADMM with other denoising methods. Parallel 
MRI however suffers from slower implementations due to the 
transfer function being large as well as harder to adapt to 
ADMM formulation efficiendy. Recently ADMM approach 
has been formulated for parallel MRI reconstruction [15], 
which mainly deals with still image reconstruction with spatial 
regularization, however the speed of the presented algorithms 



has not been evaluated for dynamic MRI reconstruction with 
temporal regularization, which lead to very large and severely 
ill-conditioned transfer functions. 

In this paper, we present two ADMM based solutions 
to analysis and synthesis prior formulations for compressed 
sensing dynamic parallel MRI reconstruction and propose 
an efficient implementation of the transfer function for the 
proposed ADMM algorithms in order to aid significantly faster 
reconstruction. The proposed method exploits the commonly 
used Cartesian subsampling pattern in MRI to provide faster 
and efficient implementation. It also benefits more from a 
parallel implementation such as in GPUs. It is demonstrated by 
the experiments that the proposed algorithm is faster and can 
be accelerated further than earlier methods when implemented 
with GPUs. 




(a) (b) 

Fig. 1. A subsampling pattern example in dynamic MRI wliere only 1/8 of 
the samples are scanned (represented by white points) (a) Subsampling pattern 
for single frame in spatial Fourier domain (b) Sampled vertical indices along 
time in Dynamic MRI 



II. Problem Formulation 

A. Compressed Sensing and Parallel MRI 

In magnetic resonance imaging, the measurements related 
to the image of a 2D slice of a volume are acquired in the 
spatial Fourier transform domain, which can be represented as 



yt = Fxt + n 
= FhFvXt + n 



(1) 
(2) 



where Fh and Fv are matrices representing Fourier transform 
operation along horizontal and vertical axes and F represents 
the 2D spatial Fourier transform, x^ and yt represent image 
of the 2D slice and corresponding Fourier measurements at 
time t respectively and n represents the random noise in the 
measured samples that is modelled as Gaussian. The equation 
in (1) can easily be expanded for 3D volumes instead of slices, 
but all the derivations in this paper will consider 2D slices for 
simplicity. The vector xt is defined as 



Xt 



Xv,l.t 



1 ^v.i.t 



^l,i.t 



Xji 



A.t 



(3) 



where Xjj^t is the image pixel at horizontal position i, vertical 
position j and at time t of an image of Uh x riy spatial 
resolution and rit frames. 

Parallel MRI uses multiple receiving coils to speed up the 
data acquisition by making use of the redundancy among 
different coils. The measurement made by c-th coil can be 
represented as 



yt,c = FScXt + n 






(4) 


= FhFv(sc0xt) + n 




(5) 
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where 



'J,2,C 



indicates the sensitivity or weight associated 



with pixel Xj^i,t by coil c. The matrix Sc is a diagonal 
matrix that has the vector Sc along its diagonal, while 
represents element by element multiplication of two vectors. 
The sensitivity matrices are acquired after calibration and often 
noise decorrelation so that n has a Gaussian distribution. 



In compressed sensing, only a subset of the measurements 
(often selected randomly) are acquired therefore reducing the 
required scan time in MRI. In Cartesian sampling, all the 
Fourier coefficients along the horizontal axis corresponding 
a specific vertical coordinate are measured sequentially and 
the samples cannot be skipped to save the acquisition time. 
Each new horizontal line however requires a switching of 
magnetic field gradient, which has a minimal switching time 
limit and the total time spent for acquisition is proportional to 
the number of scanned vertical lines, therefore the subsampling 
is usually only along vertical axis in the Fourier domain. In 
dynamic MRI, the subsampled vertical coordinates are also 
changed in time randomly. This process can be represented 
for each coil as 



yt.c = Ht^cXt + n 

= FhMv,tFvScXt + n 
= FhFv.tScXt + n 



H, 



FhFv.tS^ 



(6) 
(7) 
(8) 
(9) 



where yt.c is the vector of the sampled subset of measurements 
for coil c at time t and Fv,t — Mv,tFv is the subsampled 
Fourier transform along the vertical axis which is composed 
of subsampled Fourier Transform matrices Fv,i.t transforming 
along each horizontal coordinate i 



v,l,t 











■ v,nh,t 



(10) 



Fh in (7) represents the Fourier transform along the horizontal 
axis similar to (5), but the size of the matrix might be different 
due to subsampling. Other subsampling patterns are also 
possible using different methods of scanning [2]. A sample 
subsampling pattern for dynamic MRI can be seen in Figure 1 . 
The transfer function in dynamic MRI with parallel coils 
can be represented as 



y = FSx + n 
= Hx + n 



(11) 
(12) 



where y, x, F, S and H can be defined as 
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(13) 



(14) 



(15) 



• FhFv.t 



(16) 



and satisfy H = FS. S and Fj are block diagonal matrices 
that are composed of nt and Uc blocks along the diagonal 
respectively. 

Compressed sensing theory shows that the signal x can be 
recovered from y almost certainly if, 

1) ^x is sufficiently sparse for a known transform \I/ 
(^'igfr = ^vl/' ^ I, .' indicating the conjugate transpose) 

2) H and \I/ are sufficiently incoherent, i.e. the off diagonal 
entries of 'J/H'HNl/' after normalization are sufficiently 
small 

One of the breakthroughs in compressed sensing is that the 
second condition is shown to be satisfied when H is formed 
by random entries, or by a random subset of rows of a full 
rank matrix as in (6) regardless of "if [1], [2]. The first 
condition is also relaxed such that ^ can be an overcomplete 
transform or dictionary or a penalty operator as in case of total 
variation [16]. Under these conditions it is shown that x can 
be recovered almost certainly with the minimization 



w — arg min |lw[[o s.t. |[y - 

w 

X = \I/'w 



H*'wll^ <e 



(17) 
(18) 



in which ||.||o is the Lo-norm (which is in fact a quasi-norm) 
that is the number of non-zero entries of a vector (||x||p = 



i/p 



< e 



^xf 1 for p > 0). The constraint ||y — H\E''w||2 

ensures consistency with the measurements and it is optimum 
for i.i.d. Gaussian noise with standard deviation e. The mini- 
mization in (17) is non-convex and therefore practical methods 
do not guarantee convergence to global minimum. It is shown 
in the sparse reconstruction literature that the minimization of 
ii-norm as in 



arg mm ||w||i s.t. |[y ■ 



H*'wll^ <e 



(19) 
(20) 



will lead to the same solution as in (17) provided that w is 
sufficiently sparse [17]. The implications of this equivalence is 
significant since (19) is a convex optimization problem with 



a global minimum and can be solved very efficiently with 
methods such as Alternating Direction Method of Multipliers 
(AD MM) [18] or the Split-Bregman Method [13]. In order to 
further simplify the minimization, the constraint in (19) can 
be removed to form the unconstrained formulation 



w = arg min A 1 1 w 1 1 1 
X = \&'w 



H*'w 



(21) 

(22) 



A number of fast minimization algorithms exist for the so- 
lution of unconstrained minimization in (21) such as TwlST, 
FISTA and SALSA ([10], [14] and references therein) and the 
result is equivalent to (19) if A is adjusted properly. 

The formulation in (19) is referred to as synthesis prior for- 
mulation and (21) can be used with overcomplete transforms 
such as wavelets or non-orthogonal dictionaries. An alternative 
approach uses an analysis prior formulation with a penalty 
term for regularization as in 



X = argmin A-R(x) + ||y — Hx| 



(23) 



in which i?(x) is the penalty function that is large when x 
has characteristics different from a prior knowledge of x. A 
commonly used example for such penalty functions is the total 
variation (TV) which is defined for ID signals as 



TV(x)^^ 



Xi^l\ 



(24) 



and penalizes the signal gradient. In case of dynamic MRI, 
temporal regularization is often employed and TV can be 
defined along the temporal axis as 



TVt(x)=g||xt-xt_i||i 



I I 

-I 



•• 
•• 

I I 



Xl 



= l|Dtx||i 



(25) 



in which Dt is the temporal difference matrix. Minimization 
of (23) with i?(x) — TVt (x) is possible with algorithms such 
as MFISTA [10] or ADMM based methods [13], [14]. 

B. ADMM Basics 

Alternating Direction Method of Multipliers (ADMM) is 
fast convex minimization algorithm that makes use of variable 
splitting and Augmented Lagrangian to solve various forms of 
objective functions. For a general minimization problem such 

as 

arg min /(z) = /i(z) + /2(Gz) (26) 

z 

which may be non-trivial to directly solve for, ADMM algo- 
rithm solves the equivalent problem 



argmin/(z, v) = /i(z) + /2(v) 

Z.V 

such that Gz = v 



(27) 



Solving for (27) can be much easier than solving for (26) 
if functions /i, /2 and the auxiliary variable v are selected 
properly. The ADMM algorithm that solves for (27) can be 
summarized as iteratively applying 



SALSA proposed in [14] uses an alternative approach and 
minimizes 



z <— 



argmin /i(z) + /i||Gz — v — d| 

z 



V ^ argmin /2(v) + /i||Gz - 

V 

d ^ d - (Gz - v) 



(28) 
(29) 
(30) 



until convergence after initializing v, d and /i [12]. The 
iterative steps (28)-(30) solving (27) is also known as the Split- 
Bregman Method, and is guaranteed to converge given /i > 0, 
however convergence can be very slow unless /i is selected 
properly [13]. The parameter /i can be selected empirically 
for a given problem but more discussion on how to select 
/i theoretically can be found in [13], [12] and the references 
therein. 



III. Related Algorithms 

The ADMM approach has been proposed to solve problems 
of the form (23) before, such as the Split-Bregman Method 
proposed in [13] minimizing 



x= argmin A||v||i + ||y - Hx||| 

X 

such that V = Rx 



(31) 



assuming the penalty function is in the form i?(x) = ||Rx||i. 
This formulation is also proposed in [15] (named as PI) for 
Parallel MRI and can be solved iteratively with the steps 
similar to (28)-(30) 



V ^ argmin A||v||i + /i||Rx — v — djjj 



A 



(32) 



= 6(Rx-d,-) 
X ^argmin ||y — Hx||2 + /a||Rx — v — d||2 

X 

= (/^R'R + H'H)-MH'y + AiR'(v + d)] (33) 
d ^d - (Rx - v) (34) 

where &{.) is the element- wise soft thresolding function 
defined as 



if lal < T 



(35) 



Note that (33) requires (/xR'R + H'H)"i which is not 
invertible unless R and H have disjoint null spaces. In the 
case that it is invertible, a closed form solution is often not 
available especially if R'R 7^ I and H'H / I. For the Parallel 
MRI with a large H, it is suggested in [15] that the term in 
(33) be calculated with few steps of Conjugate Gradient (CG) 
algorithm at each iteration of PI. However the use of CG 
iterations is not preferable due to being slow and not precise 
and can greatly slow down reconstruction when H or R is 
very large. 



X =argmin A||Rv||i + |[y - Hx||2 

X 

such that V = X 
assuming an efficient method exists to solve 

V <— argmin A||Rv||i + /i||v — x — d| 



(36) 



(37) 



which sometimes might be another iterative algorithm such as 
Chambolle's Method used for TV regularization [19]. SALSA 
replaces (^R'R + H'H)-i in (33) with (/^I + H'H)-i which 
is easier to compute and has a closed form solution for 
a variety of applications, but still often relies on iterative 
algorithms to solve for (37). Although SALSA has been shown 
to perform much better than popular methods such as FISTA, 
MFISTA or TwIST, it can be very slow if {fil + UU)^ 
cannot be computed efficiently [14]. 

In addition to the algorithms summarized above which have 
been suggested for generic problems, a second algorithm is 
proposed in [15] (named as P2) specifically for Parallel MRI 
problem proposing more number of auxiliary variables such 
that 

x=argminA||v||i + ||y-Fp||2 (38) 

X 

such that V = Rm, m = x, p = Sx 

where F and S are subsampled Fourier transform and sen- 
sitivity matrices respectively as defined in (15) and (16) and 
satisfy H — FS. The ADMM algorithm solving (38) requires 
calculation of the terms (^I + R'R)"\ (^I + F'F)"i and 
(/il + S'S)~i each of which has a closed form solution and 
can be calculated fast and accurately at each iteration. However 
although each iteration of the algorithm solving (38) is quite 
fast, the number of iterations needed for convergence is much 
larger than (31) or (36) due to having too many auxiliary 
variables, and the reported speed is comparable to or worse 
than (31) with CG iterations [15]. 

IV. Proposed Methods 

Considering the drawbacks of the algorithms mentioned in 
Section III, we propose two algorithms to solve synthesis and 
analysis prior formulations respectively, which do not depend 
on any iterative steps except the iterations of ADMM and 
make use of the fast computation of {^I + H'H)^i proposed 
in Section IV-C. 

A. ADMM for Synthesis Prior 

Basic variable splitting approach in (27) can be used to solve 
(21) by 

w =argminA||v||i + ||y-H*'w|l2 (39) 

such that V = w 
i =* w (40) 

which in turn can be minimized with iterations similar to (32)- 
(34) except x is replaced with w, R with I and H with H\I/'. 



Algorithm 1 ADMM minimization to solve (21) 
1: procedure ADMM_Synthesis(y , H, *, A) 



Minimizes A||v||i 



H^'x 



s.t w — 'v,:>i. — \I/'w 



Algorithm 2 ADMM minimization to solve (23) 
1: procedure ADMM_Analysis(y , H, R, A) 

Minimizes A|jv||i + ||y — Hx|J2 s.t v = Rm, m = x 



Set /x> 0, d = 0,w = *H'y 

while convergence criteria not met do 

V <— argm.in A||v||i + /i||w — v — d 



6(w-d,A) 



w ■(— argnim ||y 

w 

1 



H*'w||^ 



/i||W — V - 

(I - **') [*H'y + n{-v + d)] 



dill 



+*(^I + H'H)-i [H'y + ^*'(v + d)] 

d <— d — (w — v) 
end while 
return x = S&'w 
end procedure 



The synthesis prior formulation is often considered as the same 
problem with (31) or (36) with a different transfer function 
H^', and can be solved with the same algorithms. However 
due to the change of transfer function, (/il + ^l/H'H^I'')^^ 
is needed instead of (/il + H'H)^^ which does not have any 
closed form solution for an arbitrary ^ and can be tedious to 
compute numerically. Assuming SI/ is a tight frame satisfying 
\1/'\1/ = 1, we propose to use 

(/il + *H'H*')^ =-I *H'(^I + HH')^H*' (41) 
A* 

= -I *(-I (;ul + H'H)^)*' 

(42) 

= -\ -**' + *(^I + H'H) 1*' 

(43) 

by making use of the matrix inversion lemma. Therefore the 
overall speed of the algorithm depends on the fast computation 
of (^I + H'H)-i. The algorithm solving (21) with ADMM 
with 'J/ being a tight frame is summarized in Algorithm 1. 

Note that the term ( —I \1/\I/' ) vanish when \1/ is an 

orthonormal transform and the analysis and synthesis prior 
formulations become equivalent as are the steps of the algo- 
rithms solving (31), (36) and (39). But when an overcomplete 
transform satisfying '$'''4' = I is used and a fast computation 
of (/^I + H'H)^^ is available. Algorithm 1 can be much faster 
than synthesis prior minimization algorithms such as FISTA, 
or other ADMM based algorithms such as SALSA that uses 
(^I + *H'H*') 1. 



B. ADMM for Analysis Prior 

A more challenging problem is the analysis prior formula- 
tion in (23) which can be approached similar to synthesis prior 



2: Set /il, ^2 > 0, di, d2 = 0, X = H'y, m = x 
3: while convergence criteria not met do 

4: V <— argmin A||v||i + /ii||v — Rm — di||| 

= 6(Rm + di,2^) 
5: m <— argmin /ii||v — Rm — di||| 





"> +/.2||m-x-d2||i 




= (^i + r'r)"' [R'(v-di) 




6: X ^ argmin y - Hx | + /i2 m - x - d2 2 


= i^i2l + H'H)-i [H'y + Ai2(m - d2)] 


7 

8 

9 

10 

11 


di ^ di (v Rm) 
d2 <— d2 — (m — x) 

end while 

return x 
end procedure 


formulation to minimize 




X =argminA V 1 + y — Hx 2 


(44) 



such that V = Rm, m = x 

Algorithm solving (44) is presented in Algorithm 2 and can be 
derived the same way as Algorithm 1. The auxiliary variable 
m in (44) essentially decouples the regularization and data 
fidelity as can be seen in lines 5 and 6 of Algorithm 2, 

and the inversions (^I + R'Rj and (p2l + H'H)"^ are 
guaranteed to exist. 

In order for each iteration of Algorithm 2 to be fast, the 

terms f^I + R'r") (Hne 5) and (/iai + H'H)"! (line 6) 
must be calculated efficiently and if possible precisely. Closed 

form solution to (— I + R'R) exist for many common 
R and it can otherwise be calculated with few Conjugate 
Gradient (CG) iterations due to R'R often having only 
few zero eigenvalues. Therefore the computation speed of 
(/i2l + H'H)^^ is the main factor determining the speed of 
each iteration of Algorithm 2. Note that when compared to 
traditional ADMM approach in (31), Algorithm 2 can be much 
faster when (/i2l + H'H)^^ can be quickly computed or has 
a closed form solution unlike (/iR'R + H'H)^^, but it will 
be slower when R'R = I due to the extra variable separation. 
When compared to SALSA, the extra variable separation 
decoupling the data fidelity and regularization also provides 
advantage for regularization terms such as TV, removing the 
need for other iterative algorithms for TV minimization. The 
iterations of Algorithm 2 will also converge faster than P2 
algorithm in [15] due to having fewer auxiliary variables with- 
out introducing additional computational complexity assuming 



computation of (/i2l + H'H) ^ can be fast and accurate. 



V. Experimental Results 



C. Fast Computation of (pl + H'tl) ^ for Cartesian Sampled 
Parallel MRI 

Both Algorithms 1 and 2 rely on fast and efficient com- 
putation of {^1 + H'H)^^ especially for a large H. We 
propose an efficient and exact calculation by making use of 
the separability of the transfer function among the horizontal 
and vertical axes in Cartesian sampled Parallel MRI. Deriving 
H'H from (14) and (9) and using Fj^Fh = I we have 





■ H'lHi 




H'H = 


••• H^jH„j 


(45) 


fi-c 






(46) 


= >JSX,tFkFhF.,,S, 

c=l 


(47) 


= £;[(s,s'J0f;,,f.,,] 

c=l 


(48) 


= ( 


Tie 


(49) 



Note from (49) that H'H is disjoint along the horizontal axis 
as well as the time axis. Therefore it is possible to have 
singular value decomposition (SVD) of H'H — UAU' as 
Uh X rit separate SVDs of n^ x n^ matrices such that 



U 



A = 



Ui 



Ai 









u. 



,u, 



,At = 



Uv,l,* 



Av4,t 






(50) 




Uv,i,tAv,i,tU.^, j J — (F.^, j jFv,i,t) 2_^(sv,i,cSv,i 



(51) 
(52) 



which is significantly easier to compute than non-separable 
case. Assuming we have the SVD of H'H = UAU' where 
A is a diagonal matrix with eigenvalues {et} on main diagonal 
and UU' = U'U = I, 

(^I + H'H)"i = (/il + UAU')"^ = U(mI + A)"^U' 

(53) 

= UA^U' (54) 

with A^ a diagonal matrix with the diagonal elements 
where e^ are the eigenvalues of A. In addition 



to being significantly faster, the computation of SVD and 
[jjl + H'H)^^ can easily be implemented in parallel since 
the equations are disjoint for each column of x at time t, 

Xv,i,t (52). 



Experiments are conducted on a 2D cardiac MRI dataset 
acquired on a 3T Siemens Trio scanner using a 32-coil matrix 
body array. Fully sampled data were acquired using a 128 x 128 
matrix (FOV = 320 x 320 mm) and 22 temporal frames. 
The fully sampled data is then retrospectively undersampled 
by a factor of 8 using a different variable density random 
undersampling along vertical axis for each time point as shown 
in Figure 1. The data is normalized so that the reconstruction 
of the fully sampled data leads to images with maximum 
intensity of 1 . The sample frames from reconstruction of the 
fully sampled data can be seen in Figure 2. The subsampled 
data is reconstructed with Algorithm 1 and Algorithm 2 using 
the SVD method to compute the term (/xl + H'H)^^ and 
setting temporal DFT as '^ and temporal TV as R respectively. 
The same A value that results in good quality reconstruction 
is used for all the tested algorithms (A = 0.002). 

The reconstruction performance with formulations in (31) 
and (38) are also presented as Pl(ri) and P2 respectively 
where n represents the number of conjugate gradient iterations 
used for PI. The DFT is selected mainly for the purpose of 
comparing Algorithm 1 to PI, and P2 for which the analysis 
and synthesis prior problems become equivalent and the draw- 
backs of using CG steps are more apparent. SALSA algorithm 
with the proposed fast computation of (/il + H'H)^^ is used 
for TV regularization only, due to Algorithm 1 and SALSA 
also being equivalent for orthonormal DFT regularization. 
The reconstruction with algorithms FISTA (for DFT) and 
MFISTA (for TV) are provided for comparison to non-ADMM 
methods, jii for each algorithm is selected empirically through 
simulations to yield for fastest convergence of each algorithm. 
In our experiments, selecting ji in Algorithm 1 (and /i2 for 
Algorithm 2) as 0.06 and the ratio ^ in Algorithm 2 as 0.5 
or 0.25 resulted in sufficiently fast convergence. The inner 
iterations for MFISTA algorithm as well as for SALSA are 
similarly selected as 10. 

All algorithms are implemented in MATLAB© R2010b 
[20] by the authors and the simulations are performed on 
a MacPro5,l computer with 2.8 GHz quad-core Intel Xeon 
central processing unit (CPU), 6 gigabytes of memory and 
MacOSX 10.6.8 operating system. Apart from the simulations 
on the CPU, each algorithm is also accelerated on GPU with 
the help of JACKET© 2.0 toolbox for MATLAB [21] and 
NVidia Quadro FX 4800 graphics card with 1.5 gigabytes 
of memory and 192 CUDA processor cores to demonstrate 
how much further each algorithm can be accelerated with the 
help of parallel processing. JACKET© 2.0 toolbox provides 
built in functions to perform almost all operations on GPU 
which can be as simple as array or matrix multiplication or 
more complicated such as Fast Fourier Transform or SVD. It 
also provides the option to perform loops with independent 
iterations in parallel therefore enabling parallel processing of 
sequential tasks. All the tested algorithms are implemented in 
order to utilize GPU as efficient as possible, keeping the steps 
of the algorithms that does not benefit from GPU acceleration 
on CPU in order to maximize available memory in the graphics 
card. Each simulation is initialized with Xiniuai = H'y 




Fig. 2. Frames 1, 4, 7, 10, 12 of the cine dataset 
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Fig. 3. The objective function for each algorithm with temporal DFT 
regularization 



and performed with 200 iterations for 10 times on minimum 
operating system load and the fastest among the 10 is reported 
for the time measure in the results. 

The convergence speed of FISTA, Algorithm 1 and PI for 
temporal DFT regularization can be seen in Figure 3. For an 



orthonormal transform. Algorithm 1 and PI have basically the 
same update on the reconstruction at each iteration, however 
the difference in their speed is significant mainly due to 
CG iterations vs SVD based computation. For a severely ill 
conditioned and large H as in the experiment, the inaccuracy 
of CG iterations results in a major impact on the speed of the 
algorithm. Even when the algorithms are accelerated on GPU, 
PI is unable to achieve the speed of CPU implementations 
of Algorithm 1 or FISTA as observed in Figure 5a. P2 
algorithm perform better than PI for CPU simulations whereas 
PI is accelerated with GPU implementation much better than 
P2. This is expected since execution of each step of P2 
algorithm is already fast and cannot be accelerated by GPU 
implementation as much as the other algorithms therefore the 
slow convergence of P2 algorithm becomes apparent when 
computational drawbacks are reduced by GPU computation. 

In our study we use the ratio of reduction in the objective 
function as the convergence criterion which is defined as 



A(fc) = 



A Jik 



l)-J(fc) 



Jik) 



(55) 



where J{k) is the value of the minimized objective function 
at iteration k. The time it takes for each algorithm to reach 
A(fc) = 0.001 is shown in Figure 5a. It can be seen from 
Figure 5a that Algorithm 1 with SVD decomposition is already 
fastest when executed on CPU, and is accelerated multiple 
times more than the other algorithms when GPU is utilized. 
For the analysis prior formulation with temporal TV, all the 
algorithms take longer time compared to DFT regularization 
case due to R being ill conditioned as well as H. The 
performance of the PI algorithm can be seen to be affected 
by this the most since the term (/iR'R + H'H)^^ becomes 
much harder to estimate. It can be seen from Figure 4 that 
the algorithm even fails to converge unless the number of 
CG iterations are kept higher than 10. Although the GPU 
acceleration improves the performance, the results are far from 
the performance of MFISTA or Algorithm 2. The speed of 
P2 algorithm is close to PI for CPU simulation but falls 
behind during simulations with GPU similar to the case with 
DFT regularization. SALSA performs better than all other 
compared algorithms but still falls behind Algorithm 2 due to 
having multiple iterations of Chambolle's Algorithm for TV 
minimization at every step. The converged objective function 
value is also slightly higher than MFISTA and Algorithm 2 
due to inaccuracy of the TV estimation with few iterations. 
The convergence speed of the algorithms can be observed in 
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Fig. 4. The objective function for each algorithm with temporal TV 
regularization 



Figure 5b. 

It can be observed from Figures 3 and 4 that the per- 
formance of both PI and P2 are worse than FISTA and 
MFISTA unlike the reported performance in [15]. The high 
undersampling ratio in dynamic MRI applications lead to a 
challenging reconstruction problem with slow convergence un- 
less the optimization algorithms are fast and accurate, and PI 
and P2 are severely affected by H. Both FISTA and MFISTA 
have computationally simple steps and for a large H the main 
computational complexity lies within calculation of H'H, 
which can be efficiently calculated thanks to the separability 
shown in (49). As a result the speed of FISTA and MFISTA are 
improved much more significantly than PI and P2 especially 
when implemented in GPU. The computational speed of these 
algorithms is therefore comparable to Algorithm 1 and 2 but 
overall speed is slower due to faster iterations of ADMM 
formulation. 
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Fig. 5. Comparison of different algorithms in terms of time spent to reach 
A(fc) = 0.001 defined in (55) 



It should be noted that although the presented algorithms 
are faster, their implementation require a large memory due 
to simultaneously storing and using the SVD of H'H. This is 
not a significant drawback since systems with high memory 
and computational power are available in commercial config- 
urations. The optimization can also be separated along the 
horizontal axis when the temporal regularization is used, in 
order to perform the reconstruction sequentially to reduce 
the memory requirement if necessary, although this would 
reduce the acceleration gain. The calculation of the initial SVD 
of H'H is not included in any of the presented simulation 
results since it can be calculated independently along the 
horizontal and temporal axis in parallel resulting in only a 
small computational overhead especially when subsampling 
ratio is high and H'H has a lot of zero eigenvalues. 

VI. Conclusions 

In this paper two new ADMM based algorithms are pre- 
sented to solve the synthesis and analysis prior regularization 



problem for compressed sensing in parallel MRI. The proposed 
algorithms make use of the ADMM formulation to provide 
faster convergence rate than other state of the algorithms 
such as FISTA, MFISTA, SALSA or traditional Split-Bregman 
Method. A computational method to improve the speed of the 
proposed algorithms is also presented which makes use of 
the separability of the transfer function in Cartesian sampled 
MRI, leading to faster convergence than other ADMM based 
methods. The performance of the proposed algorithms are 
shown to be significantly faster than other methods for a highly 
undersampled dynamic MRI application and the performance 
gap with the other algorithms is shown to increase when a 
parallel processing system is utilized such as GPUs. 

As a drawback of utilizing singular value decomposition 
of the transfer function to gain speed, implementation of 
presented algorithms require more memory than other methods 
and this requirement would increase with larger images or 
volumes. Although this does not present a problem for systems 
with large memory, it can still be overcome if needed by 
reconstructing the signal sequentially along the horizontal axis. 
The acceleration factor can easily be multiplied with the use 
of more advanced GPUs and better implementation on faster 
languages such as C/C++. 
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